A Bayesian permutation training deep representation learning method for speech enhancement with variational autoencoder
Recently, variational autoencoder (VAE), a deep representation learning (DRL)
model, has been used to perform speech enhancement (SE). However, to the best
of our knowledge, current VAE-based SE methods only apply the VAE to model the
speech signal, while noise is modeled using the traditional non-negative matrix
factorization (NMF) model. One of the most important reasons for using NMF is
that these VAE-based methods cannot disentangle the speech and noise latent
variables from the observed signal. Based on Bayesian theory, this paper
derives a novel variational lower bound for the VAE, which ensures that the VAE
can be trained in a supervised manner and can disentangle the speech and noise
latent variables
from the observed signal. This means that the proposed method can apply the VAE
to model both the speech and noise signals, which is fundamentally different
from previous VAE-based SE works. More specifically, the proposed DRL method can
learn to impose speech and noise signal priors to different sets of latent
variables for SE. The experimental results show that the proposed method can
not only disentangle speech and noise latent variables from the observed signal
but also obtain a higher scale-invariant signal-to-distortion ratio and speech
quality score than a comparable deep neural network-based (DNN) SE method.
Comment: Accepted by ICASSP 202
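For context, the standard negative ELBO that such a bound extends penalizes reconstruction error plus a KL term per latent set; with disjoint speech and noise latents, each set gets its own KL term. The paper's actual bound differs in its supervision terms; the following is only a generic sketch, and every function and variable name is illustrative:

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions.
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def split_latent_neg_elbo(recon_err, mu_s, logvar_s, mu_n, logvar_n):
    # Negative ELBO with separate KL terms for the speech (s) and
    # noise (n) latent variable sets, mirroring the idea of imposing
    # different priors on different subsets of the latent space.
    return recon_err + gaussian_kl(mu_s, logvar_s) + gaussian_kl(mu_n, logvar_n)
```

With standard-normal posteriors (zero mean, unit variance) both KL terms vanish and the loss reduces to the reconstruction error alone.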
A deep representation learning speech enhancement method using β-VAE
In previous work, we proposed a variational autoencoder-based (VAE) Bayesian
permutation training speech enhancement (SE) method (PVAE) which indicated that
the SE performance of the traditional deep neural network-based (DNN) method
could be improved by deep representation learning (DRL). Based on our previous
work, in this paper we propose to use β-VAE to further improve PVAE's
representation learning ability. More specifically, our β-VAE can improve
PVAE's capacity to disentangle different latent variables from the observed
signal without the trade-off between disentanglement and signal reconstruction
that widely exists in previous β-VAE algorithms. Unlike those algorithms, the
proposed β-VAE strategy can also be used to optimize the DNN's structure.
This means that the proposed method can not only improve PVAE's SE performance
but also reduce the number of PVAE training parameters. The experimental
results show that the proposed method can acquire better speech and noise
latent representation than PVAE. Meanwhile, it also obtains a higher
scale-invariant signal-to-distortion ratio, speech quality, and speech
intelligibility.
Comment: Submitted to Eurosipc
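The disentanglement/reconstruction trade-off mentioned above comes from the classic β-VAE objective, which simply re-weights the KL term of the ELBO: a larger β pressures the latents toward disentanglement at the cost of reconstruction fidelity. A generic sketch of that baseline objective (not the paper's modified strategy; all names are illustrative):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions.
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def beta_vae_loss(recon_err, mu, logvar, beta):
    # Classic beta-VAE objective: beta > 1 up-weights the KL term,
    # trading reconstruction quality for disentanglement pressure;
    # beta = 1 recovers the standard negative ELBO.
    return recon_err + beta * gaussian_kl(mu, logvar)
```

Raising β only scales the regularization term, which is why the baseline objective cannot improve disentanglement without hurting reconstruction.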